9 research outputs found

    Optimal Schedules for Parallel Prefix Computation with Bounded Resources

    Given $x_1, \ldots, x_N$, parallel prefix computes $x_1 \circ x_2 \circ \cdots \circ x_k$ for $1 \le k \le N$, where $\circ$ is an associative operation. We show optimal schedules for parallel prefix computation with a fixed number of resources $p \ge 2$ for a prefix of size $N \ge p(p+1)/2$. The time of the optimal schedules with $p$ resources is $\lceil 2N/(p+1) \rceil$ for $N \ge p(p+1)/2$, which we prove to be the strict lower bound (i.e., the best time that can be achieved). We then present a pipelined form of the optimal schedules with time $\lceil 2N/(p+1) \rceil + \lceil (p-1)/2 \rceil - 1$, which takes a constant overhead of $\lceil (p-1)/2 \rceil$ time more than the optimal schedules. Parallel prefix is an important common operation in many algorithms, including the evaluation of polynomials, general Horner expressions, carry look-ahead circuits, and ranking and packing problems. A most important application of parallel prefix is loop parallelizing transformation. 1 Introduction Given $x_1, \ldots, x_N$, parallel prefix computes $x_1 \circ x_2 \circ \cdots \circ x$..
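
To make the quantities in this abstract concrete, here is a minimal Python sketch. It is not the authors' schedule: `prefix` is a plain sequential scan that only shows what a prefix computation returns, and `optimal_time` and `pipelined_time` merely evaluate the two time expressions quoted in the abstract, ⌈2N/(p+1)⌉ and ⌈2N/(p+1)⌉ + ⌈(p−1)/2⌉ − 1. All function names are hypothetical.

```python
from itertools import accumulate
import operator


def prefix(xs, op):
    """Sequential reference: return [x1, x1∘x2, ..., x1∘...∘xN] for an associative op."""
    return list(accumulate(xs, op))


def _check(n, p):
    # The abstract states its bounds only for p >= 2 and N >= p(p+1)/2.
    if p < 2 or n < p * (p + 1) // 2:
        raise ValueError("bounds are stated only for p >= 2 and N >= p(p+1)/2")


def optimal_time(n, p):
    """Evaluate ceil(2N / (p + 1)) using integer arithmetic."""
    _check(n, p)
    return -(-2 * n // (p + 1))


def pipelined_time(n, p):
    """Evaluate ceil(2N / (p + 1)) + ceil((p - 1) / 2) - 1."""
    _check(n, p)
    return optimal_time(n, p) + -(-(p - 1) // 2) - 1


if __name__ == "__main__":
    print(prefix([1, 2, 3, 4], operator.add))
    print(optimal_time(10, 4), pipelined_time(10, 4))
```

With N = 10 and p = 4 (the smallest N the stated condition allows for four resources), the two expressions evaluate to 4 and 5 time steps respectively.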

    Speedup of Band Linear Recurrences in the Presence of Resource Constraints

    An $m$-th order linear recurrence system of $N$ equations computes $x_i = c_i + \sum_{j=i-m}^{i-1} a_{ij} x_j$ for $1 \le i \le N$. Linear recurrences have a role of central importance in computer design, numerical analysis, program analysis, digital signal processing and many non-numerical algorithms. However, programs containing band linear recurrences are difficult to parallelize significantly because of loop-carried dependences. We present a new method for systematically approaching the optimal parallel schedules for computing $m$th-order linear recurrences with a fixed number of processors $p$ independent of problem size $N$. Using our method, we first derive two kinds of parallel schedules, called the pipelined schedules and the exact schedules, for parallel evaluation of band linear recurrences. Our schedules have better execution times than the fastest previously published parallel schedules for p ? m 1. In particular, the exact schedules achieve an execution time of $(2m^2 + 3m)N/p + (m(m+1)(2m+1))/2$..
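
As an illustration of the recurrence defined in this abstract, the following Python sketch evaluates x_i = c_i + Σ_{j=i−m}^{i−1} a_ij x_j sequentially. It is a baseline that makes the loop-carried dependence visible, not the pipelined or exact schedules of the paper. Two assumptions are mine rather than the source's: indices are 0-based, and terms with j < 0 are treated as absent. The names `band_linear_recurrence` and `coeff` are hypothetical.

```python
def band_linear_recurrence(c, coeff, m):
    """Sequentially evaluate an m-th order linear recurrence.

    c     -- list of constants c_i
    coeff -- callable coeff(i, j) returning the coefficient a_ij
    m     -- order of the recurrence (bandwidth)
    """
    x = []
    for i in range(len(c)):
        acc = c[i]
        # Each x_i depends on the m previous values: the loop-carried dependence.
        for j in range(max(0, i - m), i):
            acc += coeff(i, j) * x[j]
        x.append(acc)
    return x


if __name__ == "__main__":
    # First order: x_i = 1 + 2 * x_{i-1}
    print(band_linear_recurrence([1, 1, 1, 1], lambda i, j: 2, 1))
    # Second order with unit coefficients (Fibonacci-like)
    print(band_linear_recurrence([1, 1, 0, 0, 0], lambda i, j: 1, 2))
```

Because every iteration reads the results of the previous m iterations, a compiler that respects these dependences cannot run the outer loop in parallel, which is the difficulty the abstract describes.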

    Computing Programs Containing Band Linear Recurrences on Vector Supercomputers

    Many large-scale scientific and engineering computations, e.g., some of the Grand Challenge problems [1], spend a major portion of execution time in their core loops computing band linear recurrences (BLR's). Conventional compiler parallelization techniques [4] cannot generate scalable parallel code for this type of computation because they respect loop-carried dependences (LCD's) in programs and there is a limited amount of parallelism in a BLR with respect to LCD's. For many applications, using library routines to replace the core BLR requires the separation of the BLR from its dependent computation, which usually incurs significant overhead. In this paper, we present a new scalable algorithm, called the Regular Schedule, for parallel evaluation of BLR's. We describe our implementation of the Regular Schedule and discuss how to obtain maximum memory throughput in implementing the schedule on vector supercomputers. We also illustrate our approach, based on our Regular Schedule, to parallel..

    High-Level Synthesis of Scalable Architectures for IIR Filters Using Parameterized MCM's

    We describe the high-level synthesis of scalable parallel architectures implementing infinite impulse response (IIR) filters using multi-chip modules (MCMs). Our approach is based on a new class of parallel schedules for computing mth-order IIR filters, called regular schedules. The simplicity of the regular schedules facilitates characterization of their inter-processor communications, which is generally difficult to express for parallel algorithms. The characterization of inter-processor communications of the regular schedules enables us to generate instruction-level behavior of the design that can be easily mapped onto MCM-based architectures. We illustrate the use of the regular schedule in algorithmic-level synthesis of MCM-based parallel application-specific processors implementing the fifth-order elliptic wave filter benchmark. Our approach yields a scalable performance measured in the filter's sample rate on both multiple-bus architectures and mesh architectures, which is not ..

    High-Level Synthesis of Scalable Architectures for IIR Filters Using Multichip Modules

    We present a new technique for the high-level synthesis of scalable MCM-based architectures implementing infinite impulse response (IIR) filters. Our technique is based on the regular schedules, a class of parallel schedules for computing mth-order IIR filters. The simplicity of the regular schedules facilitates characterization of their inter-processor communications, which is generally difficult to express for parallel algorithms. The characterization of inter-processor communications of the regular schedules enables us to generate instruction-level behavior of the design that can be easily mapped onto MCM-based architectures. We illustrate this mapping of the regular schedules onto an MCM-based architecture by designing a special-purpose processor for the fifth-order elliptic wave filter. Our design yields a scalable performance measured in the filter's sample rate, which is not known to have been achieved by previously published designs. This work differs significantly from "trad..